(Compiler) Optimization of floating-point expressions

The goal of this exercise is to learn how to control and verify the optimizations performed by a compiler on floating-point expressions.

  1. Run paranoia with different optimization flags: what's Kahan's opinion?
  2. Write simple kernels and inspect the generated object code: identify the various operations at the assembler level (see the sketch after this list)
  3. Compile the code with different optimization options and verify the generated code
  4. Learn how to avoid common pitfalls
  5. Verify the gain in performance when taking the "correct" approach
  6. Apply all this to the minimization example
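
For item 2, a minimal sketch of the kind of kernel one might write and inspect; the names and signatures are illustrative and not necessarily those found in kernels.cc:

// illustrative kernels, compiled in their own translation unit
// compare the generated code with and without the flags in OPTFLAGS

void scale(float const * x, float * y, int n, float d) {
  // with -freciprocal-math the division by the loop-invariant d may be
  // replaced by a multiplication with its precomputed reciprocal
  for (int i = 0; i != n; ++i) y[i] = x[i] / d;
}

float sum(float const * x, int n) {
  // with -fassociative-math the additions may be reordered (and vectorized),
  // which changes the rounding of the result
  float s = 0.f;
  for (int i = 0; i != n; ++i) s += x[i];
  return s;
}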

Code

in the exercises directory:
kernels.cc

Hints

// setenv OPTFLAGS "-fassociative-math -freciprocal-math -fno-math-errno -fno-signed-zeros -fno-trapping-math -ffinite-math-only"
// use extern to avoid inlining (and compile with -fPIC)
// "volatile" can be used to avoid eager optimization
// compile paranoia with -DVOLATILE and/or -DNOSUB
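
A minimal sketch of how the extern and volatile hints above can be used in a timing harness; the function names are made up for illustration:

// the kernel is only declared here and defined in a separately compiled
// translation unit (built with -fPIC), so the compiler cannot inline it
extern float sum(float const * x, int n);

// a volatile sink forces the result to be stored on every iteration,
// so the call and its arithmetic cannot be optimized away
volatile float sink;

void benchmark(float const * x, int n, int repetitions) {
  for (int r = 0; r != repetitions; ++r)
    sink = sum(x, n);
}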


pfmon --long-smpl-period=5000 --resolve-addresses --smpl-per-function --smpl-show-top=20 ./a.out
pfmon -e UNHALTED_CORE_CYCLES,ARITH:CYCLES_DIV_BUSY,SSEX_UOPS_RETIRED:SCALAR_SINGLE,SSEX_UOPS_RETIRED:PACKED_SINGLE ./a.out k


objdump -S -r -C --no-show-raw-insn -w test.o | less
(on MacOS: otool -t -v -V -X test.o | c++filt | less)


References

gcc manual
man gcc